Multichromosomal Mitochondrial Genome of Paphiopedilum micranthum: Compact and Fragmented Genome, and Rampant Intracellular Gene Transfer

Orchidaceae is one of the largest families of angiosperms. Considering the large number of species in this family and its symbiotic relationship with fungi, Orchidaceae provides an ideal model for studying the evolution of plant mitogenomes. However, only one draft mitochondrial genome from this family is available to date. Here, we present a fully assembled and annotated sequence of the mitochondrial genome (mitogenome) of Paphiopedilum micranthum, a species of high economic and ornamental value. The mitogenome of P. micranthum was 447,368 bp in length and comprised 26 circular subgenomes ranging in size from 5973 bp to 32,281 bp. The genome encoded 39 protein-coding genes of mitochondrial origin, 16 tRNAs (three of plastome origin), three rRNAs, and 16 ORFs, whereas rpl10 and sdh3 were lost from the mitogenome. Moreover, interorganellar DNA transfer was identified in 14 of the 26 chromosomes. These plastid-derived DNA fragments represented 28.32% (46,273 bp) of the P. micranthum plastome and included 12 intact genes of plastome origin. Remarkably, the mitogenomes of P. micranthum and Gastrodia elata shared 18% (about 81 kb) of their mitochondrial DNA sequences. Additionally, we found a positive correlation between repeat length and recombination frequency. The mitogenome of P. micranthum had more compact and fragmented chromosomes than other species with multichromosomal structures. We suggest that repeat-mediated homologous recombination enables the dynamic structure of mitochondrial genomes in Orchidaceae.


Introduction
The mitochondrion is a key organelle involved in a series of cellular processes. Angiosperm mitochondrial genomes (mitogenomes) are characterized by a low mutation rate, a highly dynamic genome structure, extensive variation in genome size, long non-coding regions, frequent recombination, RNA editing, and widespread horizontal gene transfer [1–7]. For instance, the smallest mitogenome, 66 kb long, is found in the hemiparasitic Viscum scurruloideum [8], whereas the mitogenome of Silene conica has expanded to 11.3 Mb [9]. Earlier studies indicated that plant mitogenomes exist as circular structures. However, there is increasing evidence that the genomic conformation of mitogenomes can be more complex than a single circle. Electron micrographs of the mitochondrial DNA of Chenopodium album have shown a subgenome that is circular with a linear tail [10], some species even show a complex branched structure [11], and a multichromosomal structure has been identified independently in multiple lineages [3,9,11–19]. For example, the mitogenome of S. conica consists of 128 circular chromosomes ranging in size from 44 kb to 163 kb [9]. The number of mitochondrial chromosomes ranges from 2 in multiple species to 132 in Picea glauca [17]. Furthermore, mitochondrial genes show disparate substitution rates [20–22], and the synonymous substitution rate in Ajuga has shown a 340-fold […] recombination frequency of the repeat pairs and tested the relation between repeat length and the recombination frequency.

The Multichromosomal Structure of the P. micranthum Mitogenome
The mitogenome of P. micranthum was assembled into 26 circular chromosomes ranging from 5973 bp to 32,281 bp in length, with a total length of 447,368 bp (Figure 1). The average GC content of the P. micranthum mitogenome was 44.4%, ranging from 40.4% to 49.2% among chromosomes (Table 1). For most chromosomes, sequencing depth exceeded 40× for the long reads and 500× for the short reads (Table S1). The long- and short-read assemblies were almost identical, except for Chr5 (20,211 bp), which existed as one circular sequence in the short-read assembly but was fragmented into Chr5A (9033 bp) and Chr5B (11,178 bp) in the long-read assembly. These two minicircles were supported by 32 and 102 long reads, respectively (Figure S1). The mitogenome of P. micranthum encoded 70 genes, including 39 mitochondrial protein-coding genes, 12 plastome-derived protein-coding genes, 16 tRNA genes, and three rRNA genes (rrn5, rrn18, and rrn26) (Figure 2, Table 2). In addition, 16 ORFs coding for hypothetical proteins with BLAST hits were retained (Figure 1, Tables 2 and S2). Besides the copy of rrn5 on Chr2, which is identical to the one annotated in G. elata [37], a second copy on Chr22 was truncated at the 5' end (88 bp) and was therefore shorter than the complete copy. Each chromosome carried one to four genes, except Chr18 (14,612 bp), which was devoid of functional genes (Figure 1). This "empty" sequence showed no significant similarity to sequences in GenBank.
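Per-chromosome GC content, as summarized in Table 1, is the proportion of G and C among the bases of each chromosome. The sketch below assumes the assembled chromosomes are available as multi-FASTA text; the function names `parse_fasta` and `gc_percent` are hypothetical, and ambiguous bases such as N are excluded from the denominator, which may differ from how the reported values were computed.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse multi-FASTA text into {record_id: sequence}; the ID is the first word of the header."""
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def gc_percent(seq: str) -> float:
    """GC content (%) over unambiguous bases only (A, C, G, T); N and other codes are ignored."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / called


if __name__ == "__main__":
    # Toy records for illustration only, not P. micranthum data.
    demo = ">Chr1 toy\nGGCC\nAATT\n>Chr2\nGCGA\n"
    for chrom, seq in parse_fasta(demo).items():
        print(chrom, len(seq), f"{gc_percent(seq):.1f}%")
```

Applied to each of the 26 chromosomes, the same function would give the per-chromosome range reported above, and a length-weighted average would give the genome-wide value.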

Repeat Sequences in the Mitogenome of P. micranthum
Overall, 27 tandem repeats, ranging from 27 bp to 308 bp in length, accounted for 1948 bp of the P. micranthum mitogenome. These repeats lay in non-coding regions, except for a 48 bp repeat in rrn26, and some of them overlapped with dispersed repeats. The mitogenome of P. micranthum contained 89 dispersed repeats (34 types) ranging from 51 bp to 672 bp, with two to four copies each and covering 9996 bp (2%) of the genome. Most of these repeats (87 of 89, 97.7%) were intermediate-sized (50 to 500 bp), and the remaining two (672 bp) were large repeats (>500 bp) (Table S5); most resided in non-coding regions. The dispersed repeats were distributed across 23 of the 26 chromosomes; Chr12, Chr25, and Chr26 contained none (Figure 4A). We found 16 pairs of repeats involved in recombination of the mitogenome structure. The alternative conformations were supported by long reads, and repeat length was positively correlated with recombination frequency (r = 0.9379, p < 0.001). For example, the recombination frequency of the longest repeat (672 bp) was 0.38, with 116 long reads supporting the alternative conformations and 193 supporting the master circle conformation, whereas homologous recombination occurred only sporadically among repeats shorter than 300 bp (Figure 4B).
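The frequency reported for the 672 bp repeat follows directly from the read counts: 116 / (116 + 193) ≈ 0.38. The sketch below reproduces that arithmetic and computes a correlation coefficient, assuming the reported r is Pearson's product-moment correlation (the text does not name the statistic); the p-value is not computed here, and the function names are illustrative.

```python
import math


def recombination_frequency(alt_reads: int, master_reads: int) -> float:
    """Share of informative long reads that support the alternative (recombinant) conformation."""
    total = alt_reads + master_reads
    if total == 0:
        raise ValueError("no informative reads span this repeat")
    return alt_reads / total


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (n >= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(ss_x * ss_y)


# Read counts reported for the longest (672 bp) repeat.
freq_672 = recombination_frequency(alt_reads=116, master_reads=193)
print(f"672 bp repeat: {freq_672:.2f}")  # 116 / 309, printed as 0.38
```

Feeding `pearson_r` the per-repeat lengths and frequencies (Table S5) would be expected to give r ≈ 0.94 if Pearson's r was indeed the statistic used.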

General Features of the P. micranthum Mitogenome
The mitogenome of P. micranthum was conserved in gene number and gene content compared with other angiosperm mitogenomes, encoding 39 of the 41 protein-coding genes present in the common ancestor of angiosperms [47]; only sdh3 and rpl10 were lost from the mitogenome of P. micranthum (Figure 2). Sdh3 and sdh4 encode subunits of succinate dehydrogenase, and both genes have been lost repeatedly from angiosperm mitogenomes [39]. Whereas many other angiosperm lineages have lost both sdh3 and sdh4 from the mitogenome, P. micranthum retained sdh4. Notably, sdh4 in P. micranthum was shortened to 204 bp; a similarly shortened sdh4 has been observed in coconut palm (183 bp) [48], and we annotated a 222 bp sdh4 in Asparagus officinalis (MT483994). Rpl10 has frequently been reported as lost in angiosperms [24] and as pseudogenized or lost in sequenced monocots [49].
The mitogenomes of P. micranthum and G. elata shared 18% (about 81 kb) of their mitochondrial DNA (mtDNA) sequences, comprising 37 protein-coding genes (33,817 bp) and 47,121 bp of non-coding regions (Figure 5). This amount of shared mtDNA is relatively small compared with most other pairs of seed plant species [50,51]. Nevertheless, the gene content of the two species was very similar, and even the lengths of their cis-splicing introns were similar (Table S4). The GC content of the 26 chromosomes of P. micranthum was more variable than that of the 19 chromosomes of G. elata. Owing to active recombination, several ancestral gene clusters conserved across angiosperms were broken up in P. micranthum; only 8 of the 14 such clusters were preserved. Even two monocot-specific gene clusters, nad9-trnY(GUA) and trnI(CAU)-trnD(GUC), were broken in the P. micranthum mitogenome. Eight gene clusters were shared between the mitogenomes of P. micranthum and G. elata: atp4-nad4L, rpl2-rps19-rps3-rpl16, rpl5-rps14-cob, rps13-nad1.x2.x3, trnY(GUA)-nad2.x3.x4.x5, matR-nad1.x1, atp1-ccmFn, and atp9-rps7. The first five are conserved in most angiosperms, whereas the other three are newly formed, and atp1-ccmFn is restricted to Orchidaceae. Five of the six trans-splicing introns (nad1i394, nad1i669, nad2i542, nad5i1455, and nad5i1477) are shared with the common ancestor of seed plants [24]. Trans-splicing of nad1i728 (the fourth intron) is sporadically distributed among angiosperms [24,52], yet most sequenced monocots, e.g., Allium cepa [53] and A. officinalis [54], trans-splice this intron, which indicates rampant recombination during mitogenome evolution. In P. micranthum, trans-splicing of nad1i728 results from chromosome fragmentation: exon 4 and exon 5 of nad1 are located on Chr7 and Chr5, respectively. Guo et al. [52] reported that shifts from cis- to trans-splicing are correlated with genome rearrangements in seed plants. The shift from intrachromosomal to interchromosomal trans-splicing observed here likewise indicates active recombination in the mitogenome.
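The shared-sequence figures for P. micranthum and G. elata are internally consistent: the coding and non-coding portions sum to about 81 kb, which is about 18% of the P. micranthum mitogenome. The check below assumes the 18% is expressed relative to the P. micranthum genome size (447,368 bp); relative to the G. elata mitogenome the percentage would differ.

```python
# Figures taken from the text.
shared_coding_bp = 33_817      # 37 shared protein-coding genes
shared_noncoding_bp = 47_121   # shared non-coding regions
p_micranthum_bp = 447_368      # total P. micranthum mitogenome length

shared_bp = shared_coding_bp + shared_noncoding_bp
shared_pct = 100 * shared_bp / p_micranthum_bp
print(f"shared: {shared_bp:,} bp ({shared_pct:.1f}% of the P. micranthum mitogenome)")
```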
Interestingly, ndh genes have experienced different extents of degradation in the plastomes of Paphiopedilum, and ndhE, ndhF, and ndhH have been lost from the plastome of P. micranthum [62]. However, pseudogene copies of these genes (ψndhE, ψndhF, and ψndhH) were detected in the P. micranthum mitogenome (Figure 1, Tables 2 and 3). Additionally, ndhJ was reported as a pseudogene in the plastome of P. micranthum owing to premature stop codons induced by non-triplet insertions [62], whereas the mitogenome of P. micranthum encodes a potentially functional copy of ndhJ. These data suggest either that the transfer of ndh genes predated their degradation in the P. micranthum plastome or that the plastid-derived sequences had more than one donor. Furthermore, in previous studies most plastid-derived genes in mitogenomes were nonfunctional pseudogenes [3,4,40,42], with a few exceptions, for instance, psaA, ndhB, and rps7 in H. cannabinus [56] and petN, psaA, atpI, trnI-CAU, and trnC-GCA in Mangifera [63]. The mitogenome of P. micranthum contains 44 genes of plastid origin, 12 of which are intact and potentially functional (ycf4, cemA, petA, rbcL, rpoA, rpl36, ndhJ, psbE, psbF, psbJ, psbL, and atpE), a situation rarely reported in previous studies (Tables 2 and 3).
Repeat sequences are a source of constant rearrangement in the mitogenome [14,68–70]. Recombination mediated by direct repeats has been documented in previous studies, e.g., in Brassica campestris [71] and Scutellaria tsinyunensis [27]. Li et al. [27] reported that recombination mediated by a pair of 175 bp direct repeats split the 354,073 bp master circle of S. tsinyunensis into two chromosomes of 255,741 bp and 98,402 bp. According to the conventional multipartite model, recombination between large repeats in the master circle produces a set of subgenomic circles [14]. Similarly, Chr5 of P. micranthum was fragmented into Chr5A and Chr5B via a pair of 122 bp repeats (Figure S1). However, we did not detect a master circle containing the entire P. micranthum mitogenome. Notably, P. micranthum had fewer repeat sequences, in both number and proportion of the genome, than the other monocot species tested [5]. Its repeat content (2%) was also lower than that of most other species, e.g., Mangifera (3.5% to 4.5%) [63], Monsonia (3.9% to 6.9%) [68], Trifolium (6.6% to 8.6%) [72], and Silene vulgaris (18.8% to 28%) [69].
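Under the multipartite model, crossing over between two directly oriented copies of a repeat resolves one circle into two smaller circles, each retaining one repeat copy, so the two products together add up to the length of the parent circle. The toy sketch below illustrates that bookkeeping on error-free strings; the function name is hypothetical, it is not how Chr5A and Chr5B were identified (they were detected from long reads), and recombination across inverted repeats, which yields an inversion isomer rather than two circles, is not modelled.

```python
def resolve_direct_repeat(circle: str, repeat: str) -> tuple[str, str]:
    """Split a circular sequence at two direct copies of `repeat` (toy multipartite model).

    `circle` is the master circle linearized at an arbitrary origin. Both repeat
    copies are assumed to lie fully inside the linear string (not spanning the
    origin) and in the same orientation; only the first two copies are used.
    """
    first = circle.find(repeat)
    if first == -1:
        raise ValueError("repeat not found")
    second = circle.find(repeat, first + len(repeat))
    if second == -1:
        raise ValueError("need a second, non-overlapping direct copy of the repeat")
    # Subcircle 1: first repeat copy plus the unique segment up to the second copy.
    sub1 = circle[first:second]
    # Subcircle 2: second repeat copy, the rest of the string, then wrap round to the first copy.
    sub2 = circle[second:] + circle[:first]
    return sub1, sub2


# Toy master circle: unique - repeat - unique - repeat - unique.
master = "AAAA" + "GATC" + "TTT" + "GATC" + "CC"
sub1, sub2 = resolve_direct_repeat(master, "GATC")
print(sub1, sub2)  # GATCTTT GATCCCAAAA
```

The reported Chr5 sizes obey the same bookkeeping: 9033 bp + 11,178 bp = 20,211 bp, the length of Chr5 in the short-read assembly.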
Long-read sequencing provides a reliable means of exploring repeat-mediated homologous recombination. Although the mitogenome of P. micranthum contains no repeats longer than 1 kb, we found coexisting alternative conformations across the flanking regions of repeat sequences. Furthermore, repeat length was strongly correlated with recombination frequency (r = 0.9379) (Figure 4, Table S5), a relationship also identified in previous studies [8,9,73]. For simplicity, we depict the mitogenome of P. micranthum as 26 minicircles; in fact, the long reads imply that it consists of a population of alternative structures arising from recombination between dispersed repeats. Moreover, repeat sequences might play an important role in the fragmentation of orchid mitogenomes.

Genome Sequencing, Assembly, and Annotation
We collected two fresh leaf samples of P. micranthum from the National Orchid Conservation and Research Center of Shenzhen (NOCC). Total genomic DNA was extracted using the cetyltrimethylammonium bromide (CTAB) method [74]. The extracted DNA was used to construct libraries with 350 bp and 20 kb insert sizes, which were sequenced on the MGI2000 (MGI, Shenzhen, China) and PacBio RS-II (Pacific Biosciences, Menlo Park, CA, USA) platforms for short and long reads, respectively.
The long reads were error-corrected using Canu v2.0 [75]. We then used mitogenome sequences downloaded from GenBank as references. Candidate mitochondrial long reads were filtered with BLASR v5.1 [76] and short reads with a Perl script described in Wang et al. [77], and the enriched reads were used for hybrid assembly in SPAdes v3.14.1 [78]. Mitogenome contigs were identified using BLASTN [79] and used as reference sequences for further analysis. We repeated these steps over multiple rounds of SPAdes assembly to improve the result.
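The study enriched candidate mitochondrial reads with BLASR (long reads) and a Perl script from Wang et al. [77] (short reads); neither tool is reproduced here. As a swapped-in illustration of the same idea, reference-guided read recruitment, the sketch below keeps reads that share at least one k-mer with a reference mitogenome on either strand. The k-mer approach, the function names, and the default k = 21 are assumptions for illustration only.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A/C/G/T/N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def kmer_set(seq: str, k: int) -> set[str]:
    """All k-mers (substrings of length k) in `seq`; empty if the sequence is shorter than k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recruit_reads(reads: list[str], references: list[str], k: int = 21,
                  min_shared: int = 1) -> list[str]:
    """Keep reads sharing at least `min_shared` k-mers with either strand of any reference."""
    ref_kmers: set[str] = set()
    for ref in references:
        ref_kmers |= kmer_set(ref, k)
        ref_kmers |= kmer_set(reverse_complement(ref), k)
    return [read for read in reads if len(kmer_set(read, k) & ref_kmers) >= min_shared]


if __name__ == "__main__":
    refs = ["ACGTACGTGGCCAATT"]                        # toy "reference mitogenome"
    reads = ["GTGGCCAAT", "ATTGGCCAC", "TTTTTTTTTT"]   # forward, reverse-strand, unrelated
    print(recruit_reads(reads, refs, k=6))
```

A smaller k or a lower `min_shared` recruits more divergent reads at the cost of more off-target reads, such as nuclear or plastid sequences that resemble mtDNA; iterating with the newly assembled contigs as references, as described above, widens the catch round by round.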
In parallel, we assembled the mitogenome from the complete, uncorrected datasets with an unpublished hybrid-assembly version of NOVOPlasty [80,81]. As a seed-and-extend assembler, NOVOPlasty needs a mitochondrial seed to initiate the assembly. Because this mitogenome consists of multiple circular chromosomes, we used all the protein-coding genes shared among angiosperms as seed sequences. We also used gene-free mitochondrial contigs from the SPAdes assembly as additional seeds. The regions recovered by both methods were identical, although the SPAdes contigs usually lacked parts of the circular chromosomes. In addition, we mapped the PacBio long reads to these contigs to verify the results, and we attempted to assemble the lost genes (rpl10 and sdh3) de novo with NOVOPlasty to confirm their absence.
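The core seed-and-extend idea can be sketched as a toy greedy extender on error-free reads. This illustrates the general principle only, not NOVOPlasty's algorithm or interface; real assemblers must also cope with sequencing errors, repeats, and circularization, and the function name here is hypothetical.

```python
def seed_and_extend(seed: str, reads: list[str], k: int = 21, max_len: int = 1_000_000) -> str:
    """Greedy rightward extension of `seed` using exact anchors (toy model, error-free reads).

    At each step the last k bases of the contig are searched for in unused reads;
    the first read containing them contributes the bases that follow the anchor.
    """
    contig = seed
    used: set[int] = set()
    while len(contig) < max_len:
        anchor = contig[-k:]
        extension = ""
        for i, read in enumerate(reads):
            if i in used:
                continue
            pos = read.find(anchor)
            end = pos + len(anchor)
            # Only reads carrying sequence beyond the anchor can extend the contig.
            if pos != -1 and end < len(read):
                extension = read[end:]
                used.add(i)
                break
        if not extension:
            break  # no unused read extends the contig further
        contig += extension
    return contig


if __name__ == "__main__":
    genome = "ACGTTAGCCATGGTACCTTGAAC"                 # toy target sequence
    reads = [genome[0:10], genome[5:16], genome[11:23]]  # overlapping error-free reads
    print(seed_and_extend("ACGTTAG", reads, k=5))
```

Because each seed extends only through the molecule it sits on, a multichromosomal mitogenome needs at least one seed per chromosome, which is why gene seeds were supplemented with gene-free SPAdes contigs for chromosomes such as Chr18.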

Identification of Plastid-Derived Regions and Other Horizontally Derived Regions
First, the mitogenome of P. micranthum was searched against the plastomes of P. micranthum (MN587791) [62] and C. tibeticum (MT937101) [85] with BLAST v2.11.0+ [79] to identify plastid-derived fragments, using a word size of seven and an E-value cutoff of 1 × 10⁻⁶ and retaining hits longer than 100 bp. Paralogs shared by mitogenomes and plastomes (e.g., atp1/atpA, rrn26/rrn23, and rrn18/rrn16) were excluded from the results, following the procedures in Guo et al. [51]. Additionally, we compared the mitochondrial homologs with the putative plastid source regions to evaluate mutations in the plastid-derived mitochondrial genes. Finally, we used the mitogenome of Ustilago maydis as a reference to identify the horizontally transferred fragments reported by Sinn and Barrett [31].
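The filtering thresholds can be applied to BLAST+ tabular output. The sketch below assumes the search was run with `-outfmt 6` (default 12 columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), that `-word_size 7` was set at search time, and that "longer than 100 bp" refers to the alignment length column. It then merges overlapping hits on the plastome so that covered length, presumably the kind of quantity behind the 46,273 bp figure, is not double-counted. Function names are hypothetical, and the exclusion of mitochondrial/plastid paralogs (e.g., atp1/atpA) would need a separate step.

```python
def plastid_hits(blast_lines, max_evalue: float = 1e-6,
                 min_length: int = 100) -> list[tuple[str, int, int]]:
    """Filter BLAST -outfmt 6 hits (mitogenome query vs plastome subject).

    Returns (subject_id, start, end) on plastome coordinates with start <= end.
    """
    hits = []
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        aln_length, evalue = int(f[3]), float(f[10])
        sstart, send = int(f[8]), int(f[9])
        if evalue <= max_evalue and aln_length > min_length:
            # Minus-strand hits report sstart > send; normalise the orientation.
            hits.append((f[1], min(sstart, send), max(sstart, send)))
    return hits


def covered_bp(hits: list[tuple[str, int, int]]) -> int:
    """Non-redundant length covered by (subject, start, end) intervals, 1-based inclusive."""
    by_subject: dict[str, list[tuple[int, int]]] = {}
    for subject, start, end in hits:
        by_subject.setdefault(subject, []).append((start, end))
    total = 0
    for intervals in by_subject.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + 1:  # overlapping or adjacent: extend the current block
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
    return total
```

Dividing `covered_bp` by the plastome length gives the percentage of the plastome represented in the mitogenome; merging on query coordinates instead would give the amount of mitogenome sequence of plastid origin.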

Repeats and Repeat-Mediated Homologous Recombination
Tandem repeats in the P. micranthum mitogenome were identified using Tandem Repeats Finder v4.09 [86] with default parameters. Dispersed repeats were detected using the Python script ROUSFinder.py [5] with a minimum repeat size of 50 bp. We then calculated the recombination frequency of 34 pairs of repeats with 100% identity, following the methods of Sullivan et al. [70]. For each repeat pair, we extracted ±2000 bp flanking regions and constructed the two potential alternative conformations. The recombination frequency was calculated by dividing the number of recombinant reads by the total number of reads spanning each repeat. Finally, we tested whether repeat length was correlated with recombination frequency.
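For a pair of direct repeats, the flanking-region procedure amounts to building the master arrangement and the arrangement produced by swapping the flanks downstream of the repeat. The toy sketch below does this with short strings and classifies error-free, forward-strand reads by exact substring matching; real PacBio reads would instead be aligned to the constructed sequences with a read mapper. Function names and the exact-match shortcut are assumptions, not the procedure of Sullivan et al. [70].

```python
def conformations(left1: str, right1: str, left2: str, right2: str, repeat: str):
    """Master and recombinant sequences around a pair of direct repeat copies.

    Copy 1 sits between left1/right1 and copy 2 between left2/right2;
    recombination swaps the downstream (right) flanks.
    """
    master = (left1 + repeat + right1, left2 + repeat + right2)
    recombinant = (left1 + repeat + right2, left2 + repeat + right1)
    return master, recombinant


def count_support(reads, master, recombinant) -> tuple[int, int]:
    """Count reads matching only master or only recombinant sequences.

    Reads that fit both (e.g. lying entirely within the repeat) or neither are uninformative.
    """
    n_master = n_recomb = 0
    for read in reads:
        in_master = any(read in seq for seq in master)
        in_recomb = any(read in seq for seq in recombinant)
        if in_master and not in_recomb:
            n_master += 1
        elif in_recomb and not in_master:
            n_recomb += 1
    return n_master, n_recomb


# Toy example: 5 bp flanks around a 7 bp repeat.
master, recombinant = conformations("AAAAA", "CCCCC", "GGGGG", "TTTTT", "GATTACA")
reads = ["AAGATTACACC", "AAGATTACATT", "GATTACA", "GGGATTACACCC"]
n_master, n_recomb = count_support(reads, master, recombinant)
frequency = n_recomb / (n_master + n_recomb)
print(n_master, n_recomb, round(frequency, 2))  # 1 2 0.67
```

Only reads that extend from unique flank on one side, through the repeat, into unique flank on the other side are informative, which is why long reads are needed to score recombination across the larger repeats.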

Conclusions
We accurately assembled the mitogenome of P. micranthum using a combination of long- and short-read data. The mitogenome of P. micranthum has a typical multichromosomal structure and retains a large amount of plastid-derived sequence acquired through intracellular gene transfer. Considering its genome size and chromosome number, the mitogenome of P. micranthum is more fragmented than those of most other species with multichromosomal genome structures. The long reads provide strong evidence for plastome-to-mitogenome intracellular gene transfer and for repeat-mediated homologous recombination. Comparison of the P. micranthum mitogenome with that of G. elata sheds light on mitogenome evolution in Orchidaceae. Although the two species have similar gene content, their mitogenomes share only 81 kb of mtDNA sequence. Given the disparities in genome size and chromosome number, the high frequency of recombination, intraspecific genome structure variation, and the low collinearity between the two orchids, our understanding of orchid mitogenome evolution remains rather limited. Further studies are needed to unravel the mitogenome evolution of orchids.